docs(education): add eval-driven-development examples page #716

Open
justinmclean wants to merge 1 commit into apache:main from justinmclean:education-eval-driven-development

@justinmclean

Member

Summary

Creates docs/education/eval-driven-development.md, which explains how to think about correctness when "correct" is a distribution and gives four worked examples drawn from real Magpie skills (issue-triage classification, prompt-injection resistance, prose grading with a judge model, and structural assertions for multi-field output). The page is wired to the framework's shared eval harness (tools/skill-evals/) rather than a parallel approach.
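For reviewers unfamiliar with the idea, here is a rough sketch of what treating "correct" as a distribution can look like: run a non-deterministic skill many times, apply a structural assertion to each output, and compare the pass rate against a threshold. Everything in it is hypothetical and for illustration only (`structural_check`, `pass_rate`, `make_fake_skill`, the field names, and the 0.8 threshold); it does not use or describe the actual tools/skill-evals/ API.

```python
import random
from typing import Callable

# Hypothetical label set for an issue-triage style skill (an assumption).
ALLOWED_LABELS = {"bug", "feature", "question"}


def structural_check(output: dict) -> bool:
    """Structural assertion: required fields exist and have plausible values."""
    return (
        output.get("label") in ALLOWED_LABELS
        and isinstance(output.get("rationale"), str)
        and len(output["rationale"]) > 0
    )


def pass_rate(skill: Callable[[str], dict], prompt: str, trials: int) -> float:
    """Run the skill `trials` times and return the fraction of passing outputs."""
    passed = sum(structural_check(skill(prompt)) for _ in range(trials))
    return passed / trials


def make_fake_skill(seed: int, failure_rate: float) -> Callable[[str], dict]:
    """Stand-in for a model-backed skill; a real one would call an LLM."""
    rng = random.Random(seed)

    def skill(prompt: str) -> dict:
        if rng.random() < failure_rate:
            return {"label": "bug"}  # malformed output: rationale is missing
        return {"label": "bug", "rationale": f"'{prompt}' describes a crash"}

    return skill


if __name__ == "__main__":
    skill = make_fake_skill(seed=0, failure_rate=0.1)
    rate = pass_rate(skill, "App crashes on startup", trials=200)
    threshold = 0.8  # assumed acceptance bar, not taken from the PR
    print(f"pass rate {rate:.2f}, threshold {threshold:.2f}, ok={rate >= threshold}")
```

The point of the sketch is that a single failing run does not fail the eval; the assertion is on the rate across many runs, which is why the threshold has to be chosen deliberately.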

Also creates docs/education/README.md — the landing-page index for the maintainer-education stream — with eval-driven-development.md as the first stable link and the remaining stream pages (pattern-catalogue, your-first-skill, workshops) listed as Planned until their own branches land.

Generated-by: Claude (Opus 4.7)

Type of change

  • Skill change (.claude/skills/<name>/) — eval fixtures updated below
  • Tool / bridge contract (tools/<system>/*.md)
  • Python package (tools/*/ with pyproject.toml)
  • Groovy reference impl
  • Cross-cutting (RFC, AGENTS.md, sandbox, privacy-LLM)
  • Documentation (docs/, README.md, CONTRIBUTING.md)
  • Project template (projects/_template/)
  • CI / dev loop (prek, workflows, validators)
  • Other:

Test plan

  • prek run --all-files passes
  • For Python packages touched: uv run pytest / ruff check / mypy passes
  • For Groovy bridges touched: command-line invocation tested end-to-end
  • For skill changes: eval suite passes for the affected skill
    (PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner tools/skill-evals/evals/<skill>/)
  • For skill behaviour changes: a new or updated eval fixture is included in this PR
    (a regression test for the bug fixed / the behaviour added — see CONTRIBUTING.md)
  • Other:

@justinmclean self-assigned this Jul 3, 2026